56 research outputs found

    Prosody and intrasyllabic timing in French

    Get PDF
    Durational variation associated with accentuation and final lengthening is examined in a corpus of articulatory data for French. Both factors are associated with measurable differences in acoustic duration. However, two different articulatory strategies are employed to produce these contrasts, although both result in superficially longer and more displaced gestures. Parts of this research were supported by the National Science Foundation (USA) under Grant no. IRI-8858109 to Mary Beckman, the Ohio State University, and by the National Institutes of Health (USA) under Grant no. NS-13617 to Haskins Laboratories.

    Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information

    Get PDF
    Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords this perceptual enhancement is thought to be the integration of concordant information from multiple sensory channels at common sites of convergence, known as multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by, for example, visual information signaling the onsets and offsets of the acoustic speech signal, or for activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information, while controlling for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe contribute to improved speech intelligibility.
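    As a rough illustration of the spatial filtering manipulation described in this abstract, the sketch below builds low-frequency (LF) and mid-frequency (MF) versions of a single video frame with a 2D wavelet decomposition. The wavelet family, number of levels, and band assignments are illustrative assumptions, not the stimulus-generation settings used in the study.

```python
# Minimal sketch: spatial wavelet bandpass filtering of one grayscale frame.
import numpy as np
import pywt

def wavelet_bandpass(frame, wavelet="db4", levels=5, keep=("approx",)):
    """Reconstruct `frame` from a chosen subset of spatial wavelet bands.

    keep: "approx" keeps the coarsest approximation; integers keep the detail
    band at that decomposition level (1 = finest, `levels` = coarsest).
    """
    coeffs = pywt.wavedec2(frame, wavelet, level=levels)
    out = [coeffs[0] if "approx" in keep else np.zeros_like(coeffs[0])]
    for i, details in enumerate(coeffs[1:]):        # coarsest -> finest details
        level = levels - i
        if level in keep:
            out.append(details)
        else:
            out.append(tuple(np.zeros_like(d) for d in details))
    rec = pywt.waverec2(out, wavelet)
    return rec[: frame.shape[0], : frame.shape[1]]  # trim reconstruction padding

# Hypothetical usage on a single frame:
# frame = np.random.rand(240, 320)
# lf = wavelet_bandpass(frame, keep=("approx",))    # gross low-frequency motion only
# mf = wavelet_bandpass(frame, keep=(3, 4))         # mid-frequency detail bands
```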

    Analysis and synthesis of talking faces by facial motion mapping

    No full text
    Natural face motion during speech is one of the most important factors not only in improving the realism of talking face animation but also in inducing correct perception of auditory information. In this paper, we propose a new technique called Facial Motion Mapping that maps human face motion data onto any target character based on the similarity of deformation characteristics between the real person and the target character. The technique requires only one set of face postures per person or character, with a consistent mesh topology within each set, and maps face motion to characters with different topologies using the results of principal component analysis (PCA). Even a small number of mapping parameters controlling the deformation can create a realistic talking face animation using deformation features already contained in each set. We demonstrate this technique for a 3D human face, a 3D dog face, and a 2D cartoon face.
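    To make the mapping idea concrete, here is a minimal sketch assuming each character is represented by its key postures stacked as flattened vertex coordinates; the direct per-frame re-use of PCA coefficients (without per-mode rescaling or correspondence refinement) is a simplification, not the authors' exact formulation.

```python
# Simplified sketch of PCA-based motion transfer between faces with different topologies.
import numpy as np

def pca_basis(postures, n_components):
    """postures: (n_postures, n_vertices*3) stacked key face shapes."""
    mean = postures.mean(axis=0)
    # SVD of the centered posture set gives the principal deformation modes.
    _, _, vt = np.linalg.svd(postures - mean, full_matrices=False)
    return mean, vt[:n_components]                    # (D,), (k, D)

def map_motion(src_frames, src_postures, tgt_postures, n_components=5):
    """Transfer source motion frames to the target character."""
    src_mean, src_basis = pca_basis(src_postures, n_components)
    tgt_mean, tgt_basis = pca_basis(tgt_postures, n_components)
    # Per-frame deformation coefficients in the source PCA space.
    coeffs = (src_frames - src_mean) @ src_basis.T    # (n_frames, k)
    # Re-synthesize the deformation with the target character's basis.
    return tgt_mean + coeffs @ tgt_basis              # (n_frames, D_target)
```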

    Video-based face motion measurement

    No full text
    In this paper, we describe and evaluate a noninvasive method of measuring face motion during speech production. Reliable measures are extracted from standard video sequences using an image analysis process that takes advantage of important constraints o…

    Audiovisual Speech Processing

    No full text
    When we speak, we configure the vocal tract, which shapes the visible motions of the face and the patterning of the audible speech acoustics. Similarly, we use these visible and audible behaviors to perceive speech. This book showcases a broad range of research investigating how these two types of signals are used in spoken communication, how they interact, and how they can be used to enhance the realistic synthesis and recognition of audible and visible speech. The volume begins by addressing two important questions about human audiovisual performance: how auditory and visual signals combine to access the mental lexicon, and where in the brain this and related processes take place. It then turns to the production and perception of multimodal speech and how structures are coordinated within and across the two modalities. Finally, the book presents overviews and recent developments in machine-based speech recognition and synthesis of AV speech.

    Estimation and animation of faces using facial motion mapping and a 3D face database

    No full text
    Realistic facial animation remains one of the major challenges in computer graphics. The first step in such animation is to acquire a realistic face model. Expensive scanning devices such as laser range finders are one convenient way to capture realistic face models. However, as presented by Blanz and Vetter (1999), Blanz et al. (2001), and Hwang et al. (2000), a 3D face database is also quite useful for creating models from the face characteristics it already contains. Once such a database is established, almost any 3D face can be created from features extracted from photographs.
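    A rough sketch of this database-driven estimation idea is given below, under simplifying assumptions: the database is compressed with PCA, the landmark vertex indices are known, feature points from the photograph are treated as orthographic x/y coordinates, and a small ridge term stabilizes the fit. This is an illustration of the general approach, not the fitting procedure of the cited papers.

```python
# Sketch: estimate a 3D face from photo landmarks using a PCA face database.
import numpy as np

def build_model(db_faces, k=20):
    """db_faces: (n_faces, n_vertices*3) vertex coordinates of database faces."""
    mean = db_faces.mean(axis=0)
    _, s, vt = np.linalg.svd(db_faces - mean, full_matrices=False)
    modes = (s[:k, None] * vt[:k]) / np.sqrt(len(db_faces))   # variance-scaled modes
    return mean, modes                                          # (D,), (k, D)

def fit_to_landmarks(mean, modes, landmark_idx, landmarks_2d, reg=1e-2):
    """Estimate PCA coefficients from 2D feature points (assumed orthographic x/y)."""
    n_v = mean.size // 3
    mean_v = mean.reshape(n_v, 3)
    modes_v = modes.reshape(len(modes), n_v, 3)
    # Keep only the x/y coordinates of the landmark vertices.
    A = modes_v[:, landmark_idx, :2].reshape(len(modes), -1).T   # (2L, k)
    b = (landmarks_2d - mean_v[landmark_idx, :2]).ravel()        # (2L,)
    # Ridge-regularized least squares for the model coefficients.
    coeffs = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ b)
    return mean + coeffs @ modes    # full 3D face estimate, (n_vertices*3,)
```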